Distributed statistical inference for massive data

Abstract

This paper considers distributed statistical inference for general symmetric statistics in the context of massive data with efficient computation. Estimation efficiency and asymptotic distributions of the distributed statistics are provided, which reveal different results between the nondegenerate and degenerate cases and show that the number of data subsets plays an important role. Two distributed bootstrap methods are proposed and analyzed to approximate the underlying distribution of the distributed statistics with improved computational efficiency over existing methods. The accuracy of the distributional approximation by the bootstrap is studied theoretically. One of the methods, the pseudo-distributed bootstrap, is particularly attractive if the number of datasets is large, as it directly resamples the subset-based statistics, assumes less stringent conditions, and its performance can be improved by studentization.
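The idea of resampling subset-based statistics can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the choice of the sample variance as the symmetric statistic, the simple averaging of subset values, and the percentile interval are all assumptions made for illustration, and no studentization is applied.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def split_into_subsets(data: Sequence[float], k: int) -> List[List[float]]:
    """Partition the data into k disjoint blocks of roughly equal size."""
    return [list(data[i::k]) for i in range(k)]


def distributed_statistic(
    data: Sequence[float], k: int, stat: Callable[[Sequence[float]], float]
) -> Tuple[float, List[float]]:
    """Compute the statistic on each subset separately, then average (assumed aggregation)."""
    subset_stats = [stat(block) for block in split_into_subsets(data, k)]
    return statistics.fmean(subset_stats), subset_stats


def resample_subset_statistics(
    subset_stats: Sequence[float], n_boot: int, rng: random.Random
) -> List[float]:
    """Resample the k subset-level values (not the raw data) and average each draw."""
    k = len(subset_stats)
    return [
        statistics.fmean(rng.choices(subset_stats, k=k)) for _ in range(n_boot)
    ]


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

# The sample variance is a symmetric function of its arguments.
estimate, subset_stats = distributed_statistic(data, k=100, stat=statistics.variance)

replicates = sorted(resample_subset_statistics(subset_stats, n_boot=500, rng=rng))
lower, upper = replicates[12], replicates[487]  # rough 95% percentile interval
print(f"estimate={estimate:.3f}, interval=({lower:.3f}, {upper:.3f})")
```

Each bootstrap replicate touches only the 100 subset-level values, so the resampling cost does not grow with the size of the raw data. That is the computational appeal the abstract attributes to resampling subset-based statistics.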

Related articles

Distributed Sampling Storage for Statistical Analysis of Massive Sensor Data

Cyber-physical systems interconnect the cyber world with the physical world in which sensors are massively networked to monitor the physical world. Various services are expected to be able to use sensor data reflecting the physical world with information technology. Given this expectation, it is important to simultaneously provide timely access to massive data and reduce storage costs. We propo...

Scalable Algorithms for Distributed Statistical Inference

The classical framework on distributed inference considers a set of nodes taking measurements and a fusion center making the final decision on the underlying phenomenon, without dealing with the issue of transporting the measurements to the fusion center. Such an approach introduces significant overhead in communication. Communicating all the raw data for inference is not scalable: in this case...

Approximated Bayesian Inference for Massive Streaming Data

Extracting meaningful information out of massive streaming data is a significant challenge due to the high dimensionality of the inference problem and limits on available computational power and memory. While Bayesian models often convey significant inferential advantages, standard computational algorithms relying on Markov chain Monte Carlo are infeasible to apply. This motivates online variat...

Communication-Efficient Distributed Statistical Inference

We present a Communication-efficient Surrogate Likelihood (CSL) framework for solving distributed statistical inference problems. CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation and Bayesian inference. For low-dimensional estimation, CSL provably improves upon naive averaging schem...

Journal

Journal title: Annals of Statistics

Year: 2021

ISSN: 0090-5364, 2168-8966

DOI: https://doi.org/10.1214/21-aos2062